Purpose: Create interactive geospatial visualizations with minimal Python knowledge and only a few packages. The visualizations are built incrementally, culminating in a final visualization that combines most of the tools described beforehand.
NOTE: Before getting started, extract all files from the .zip file into the data folder. You can find the GitHub repository for my work here. If you're working in Anaconda, you'll only need to install geopandas. Uncomment the lines starting with "!" to make sure you have all the necessary packages.
## Anaconda installs
# Installs geopandas
# !conda install geopandas
# Updates all other packages
# !conda update --all
If you're not working in Anaconda, you'll likely need to install a few more packages.
## Non-Anaconda installs
# Installs geopandas
# !pip install geopandas
# !pip install bokeh
# !pip install pandas
# !pip install matplotlib
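# Upgrades the packages above to their latest versions (conda is not available outside Anaconda)
# !pip install --upgrade geopandas bokeh pandas matplotlib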
Data: County-level statistics for the DC, Maryland, and Virginia Area.
## Package imports
# For handling data in general
import pandas as pd
# For importing shapefiles
import geopandas as gpd
# For converting dataframes to json files
import json
# For visualizations
import bokeh
## Data imports
# Read in the federal data
data = pd.read_csv("data/federal_data.csv", index_col = 0)
data.head()
| | GEOID | year | units | units_sf | units_2_4 | units_mf | name | state | land_area | inequality_index | ... | private_hospitals | non_profit_hospitals | tribal_hospitals | exp_homelessness | votes_dem_percent | votes_rep_percent | votes_green_percent | votes_lib_percent | votes_other_percent | rural_level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9315 | 11001 | 1990 | 368 | 180 | 180 | 162 | Washington | DC | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| 9316 | 11001 | 1991 | 333 | 83 | 83 | 236 | Washington | DC | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9317 | 11001 | 1992 | 132 | 92 | 92 | 26 | Washington | DC | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9318 | 11001 | 1993 | 305 | 99 | 142 | 163 | Washington | DC | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9319 | 11001 | 1994 | 210 | 96 | 96 | 114 | Washington | DC | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 54 columns
These data are all publicly available from a range of US federal agencies. A codebook for the features can be found in the data folder. For this tutorial, we will use the Census's inequality index for 2019.
# Subset to GEOID, year, county name, state, and inequality index, and save as a new dataset
df = data[["GEOID", "year", "name", "state", "inequality_index"]]
# Keep only 2019 rows that have an inequality index
df = df.loc[(~df["inequality_index"].isna()) &
            (df["year"] == 2019)]
Now we'll import the Census's county shapefiles.
# Read in the corresponding spatial data
counties_usa = gpd.read_file('data/shapefiles/census_counties.shp')
# Converts to Web Mercator Projection from latitude and longitude
counties_usa = counties_usa.to_crs("epsg:3857")
counties_usa.head()
| | STATEFP | COUNTYFP | COUNTYNS | AFFGEOID | GEOID | NAME | LSAD | ALAND | AWATER | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21 | 007 | 00516850 | 0500000US21007 | 21007 | Ballard | 06 | 639387454 | 69473325 | POLYGON ((-9927624.585 4445563.074, -9927403.6... |
| 1 | 21 | 017 | 00516855 | 0500000US21017 | 21017 | Bourbon | 06 | 750439351 | 4829777 | POLYGON ((-9400114.024 4619515.190, -9399944.3... |
| 2 | 21 | 031 | 00516862 | 0500000US21031 | 21031 | Butler | 06 | 1103571974 | 13943044 | POLYGON ((-9678657.320 4449343.722, -9678501.5... |
| 3 | 21 | 065 | 00516879 | 0500000US21065 | 21065 | Estill | 06 | 655509930 | 6516335 | POLYGON ((-9364932.278 4529453.689, -9364733.6... |
| 4 | 21 | 069 | 00516881 | 0500000US21069 | 21069 | Fleming | 06 | 902727151 | 7182793 | POLYGON ((-9349087.507 4642551.601, -9348884.6... |
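The `to_crs("epsg:3857")` call reprojects longitude/latitude in degrees into Web Mercator coordinates in meters, which is why the geometries above print as large numbers such as `-9927624.585`. As a rough sketch (spherical formulas using the WGS84 semi-major axis, ignoring the small NAD83/WGS84 datum difference of the Census files), the forward and inverse conversions look like this. The helper names `lonlat_to_mercator` and `mercator_to_lonlat` are hypothetical and not part of geopandas; the inverse also shows which longitudes and latitudes the `x_range`/`y_range` bounds used for the slider plots correspond to.

```python
import math

# Web Mercator (EPSG:3857) treats Earth as a sphere with the WGS84 semi-major axis (meters)
EARTH_RADIUS_M = 6378137.0


def lonlat_to_mercator(lon_deg, lat_deg):
    """Hypothetical helper: project longitude/latitude (degrees) to Web Mercator x/y (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_to_lonlat(x, y):
    """Hypothetical helper: invert the projection, from meters back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Approximate location of Washington, DC
dc_x, dc_y = lonlat_to_mercator(-77.0369, 38.9072)
print(round(dc_x), round(dc_y))  # roughly -8.58e6 and 4.71e6 meters

# Corners of the DMV plot bounds, converted back to degrees
print(mercator_to_lonlat(-9.35e6, 4.3e6))  # roughly (-84.0, 36.0), southwest corner
print(mercator_to_lonlat(-8.3e6, 4.9e6))   # roughly (-74.6, 40.2), northeast corner
```

The DC result lands close to the DC polygon coordinates in the merged table, which is a quick sanity check that the reprojection worked.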
Let's subset to DC, Maryland, and Virginia (FIPS 11, 24, and 51) to ensure this runs quickly.
# Subset the shapefile data to DC, Maryland, and Virginia
counties_usa = counties_usa.loc[counties_usa["STATEFP"].isin(["11", "24", "51"])]
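# Subset to GEOID and geometry (.copy() avoids pandas' SettingWithCopyWarning on the next assignment)
counties_usa = counties_usa[["GEOID", "geometry"]].copy()
# Convert GEOID from strings (e.g. "11001") to integers so it matches the CSV's GEOID
counties_usa["GEOID"] = counties_usa["GEOID"].astype(int)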
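The integer conversion matters because the shapefile stores `GEOID` as a five-character string, while `pd.read_csv` parses the CSV's `GEOID` column as integers. A minimal sketch with two toy frames (the names `shapes` and `stats` and their values are illustrative, not the tutorial's data) shows the mismatch and the fix. Recent pandas versions raise a `ValueError` when asked to merge a string key column with an integer key column, rather than silently matching nothing.

```python
import pandas as pd

# Toy stand-ins for the shapefile (string GEOIDs) and the CSV (integer GEOIDs)
shapes = pd.DataFrame({"GEOID": ["11001", "24003", "51015"],
                       "county": ["Washington", "Anne Arundel", "Augusta"]})
stats = pd.DataFrame({"GEOID": [11001, 24003, 51015],
                      "inequality_index": [0.5269, 0.4133, 0.4100]})

try:
    shapes.merge(stats, how="left", on="GEOID")
except ValueError as err:
    # pandas refuses to join a string key column to an int64 key column
    print("Merge failed:", err)

# Give both key columns the same integer dtype, as the tutorial does
shapes["GEOID"] = shapes["GEOID"].astype(int)
merged = shapes.merge(stats, how="left", on="GEOID")
print(merged)
```

One caveat of `astype(int)`: it drops leading zeros (a code like `"01001"` becomes `1001`). That is harmless here because both sides end up as integers, and none of the DC, Maryland, or Virginia codes start with zero anyway.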
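Finally, we'll merge the county shapes with the federal data, then keep only the columns needed for plotting (dropping GEOID and year). ***NOTE:*** The shapefile MUST be on the left in the merge. Calling `merge` on the GeoDataFrame returns a GeoDataFrame, whereas calling it on the plain pandas DataFrame returns a plain DataFrame, which loses the geospatial behavior (such as `to_json` producing GeoJSON) used below.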
# Merge in shapefiles
df = counties_usa.merge(df,
                        how = "left",  # keep every county shape, even if it has no 2019 data
                        on = "GEOID")
df = df[["inequality_index", "name", "state", "geometry"]]
df.head()
| | inequality_index | name | state | geometry |
|---|---|---|---|---|
| 0 | 0.5269 | Washington | DC | POLYGON ((-8584932.302 4712271.130, -8584127.7... |
| 1 | 0.4133 | Anne Arundel | MD | POLYGON ((-8553829.970 4736456.566, -8553431.4... |
| 2 | 0.4100 | Augusta | VA | POLYGON ((-8853604.453 4601507.408, -8853543.1... |
| 3 | 0.4677 | Charlotte | VA | POLYGON ((-8783618.446 4442213.903, -8783554.3... |
| 4 | 0.6002 | Dickenson | VA | POLYGON ((-9189850.875 4467418.047, -9189465.3... |
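Because the merge is a left join, any county without a 2019 inequality value keeps a `NaN` in that column. This matters when the color mapper and slider bounds are computed: Python's built-in `min` and `max` work by `<`/`>` comparisons, every comparison involving `NaN` is `False`, so the result depends on where the `NaN` sits. The pandas `Series.min()`/`Series.max()` methods skip missing values by default. A small illustration with a hypothetical list of values:

```python
import math

import pandas as pd

values = [float("nan"), 0.4133, 0.5269, 0.6002]  # illustrative; the first value is missing

print(math.isnan(min(values)))  # True: NaN came first and no comparison ever replaces it
print(min(values[::-1]))        # 0.4133: same numbers, different order, different answer
print(pd.Series(values).min())  # 0.4133: pandas skips NaN by default (skipna=True)
```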
Goal: Create and fine-tune a rudimentary Bokeh visualization.
# Convert to JSON format for plotting
from bokeh.models import GeoJSONDataSource
df_geo = GeoJSONDataSource(geojson = df.to_json())
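`GeoJSONDataSource` takes the GeoJSON string produced by `df.to_json()` and exposes each feature's coordinates as `xs`/`ys` columns, plus one column per property (`name`, `state`, `inequality_index`). That is why the plotting calls refer to `'xs'`, `'ys'`, and field names such as `"state"`. The sketch below imitates that flattening with only the standard library. `flatten_features` is a hypothetical helper that handles only the exterior ring of `Polygon` geometries, and the two features are made up; Bokeh's own implementation deals with more geometry types.

```python
import json

# A tiny hand-made FeatureCollection shaped like GeoDataFrame.to_json() output (coordinates are made up)
geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "Example A", "state": "VA", "inequality_index": 0.45},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"name": "Example B", "state": "MD", "inequality_index": 0.52},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}},
    ],
})


def flatten_features(geojson_str):
    """Hypothetical helper: turn Polygon features into column lists like a patches() source."""
    columns = {"xs": [], "ys": []}
    for feature in json.loads(geojson_str)["features"]:
        exterior = feature["geometry"]["coordinates"][0]  # first ring = outer boundary
        columns["xs"].append([pt[0] for pt in exterior])
        columns["ys"].append([pt[1] for pt in exterior])
        # Every property becomes its own column, one entry per feature
        for key, value in feature["properties"].items():
            columns.setdefault(key, []).append(value)
    return columns


cols = flatten_features(geojson)
print(cols["xs"])     # [[0, 1, 1, 0], [2, 3, 3, 2]]
print(cols["state"])  # ['VA', 'MD']
```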
# Ensures all plots are outputted to the notebook
from bokeh.io import output_notebook, show
output_notebook() # Very important! Without this, show() writes each plot to an HTML file and opens it in a new browser tab instead of displaying it inline
from bokeh.plotting import figure
from bokeh.models import CategoricalColorMapper # For coloring counties by state
from bokeh.palettes import brewer # For selecting county colors
# Dark2 provides a qualitative, colorblind friendly color palette
# 3 specifies the number of categories
palette = brewer['Dark2'][3]
# Maps colors to states
mapper = CategoricalColorMapper(palette=palette,
factors=["DC", "MD", "VA"])
# Create figure object
p = figure(title = 'States: Categorical Color Mapper',
height = 600, # `height`/`width` replace plot_height/plot_width, which were removed in Bokeh 3.0
width = 950,
toolbar_location = 'below',
tools = "pan, wheel_zoom, box_zoom, reset, save")
# Remove axes and grids
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False
# Add counties
counties = p.patches('xs','ys', source = df_geo,
fill_color = {"field" : "state",
"transform" : mapper},
line_color = "black",
line_width = 0.25,
fill_alpha = 1
)
show(p)
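Conceptually, `CategoricalColorMapper` is a lookup from each factor to the palette entry at the same position, with anything not listed in `factors` drawn in `nan_color` (gray by default). A pure-Python sketch of that lookup follows. It assumes `brewer['Dark2'][3]` holds the ColorBrewer Dark2 colors `#1b9e77`, `#d95f02`, and `#7570b3`; the names `dark2_colors`, `state_factors`, and `categorical_color` are hypothetical and chosen so they do not overwrite the notebook's `palette` or `mapper`.

```python
# Assumed contents of bokeh.palettes.brewer['Dark2'][3] (ColorBrewer Dark2)
dark2_colors = ("#1b9e77", "#d95f02", "#7570b3")
state_factors = ["DC", "MD", "VA"]
NAN_COLOR = "gray"  # fallback for values that are not in state_factors

# Pair each factor with the palette entry at the same position
state_lookup = dict(zip(state_factors, dark2_colors))


def categorical_color(value):
    """Sketch of a categorical color lookup; unknown categories fall back to NAN_COLOR."""
    return state_lookup.get(value, NAN_COLOR)


for state in ["DC", "VA", "WV"]:
    print(state, categorical_color(state))
```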
from bokeh.models import LinearColorMapper # For coloring counties by inequality index
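palette_ineq = brewer['Reds'][8] # Select 8 colors from the sequential Reds palette
palette_ineq = palette_ineq[::-1] # Reverse the palette (Bokeh lists it dark to light) so higher inequality is darker
# Maps inequality values to colors
mapper = LinearColorMapper(palette=palette_ineq,
low = df["inequality_index"].min(), # Series.min()/max() skip missing values
high = df["inequality_index"].max())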
# Create figure object
p = figure(title = 'Inequality: Linear Color Scale',
height = 600,
width = 950,
toolbar_location = 'below',
tools = "pan, wheel_zoom, box_zoom, reset, save")
# Remove axes and grids
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False
# Add counties
counties = p.patches('xs','ys', source = df_geo,
fill_color = {"field" : "inequality_index",
"transform" : mapper},
line_color = "black",
line_width = 0.25,
fill_alpha = 1
)
show(p)
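`LinearColorMapper` splits the interval between `low` and `high` into as many equal-width bins as there are palette colors and paints each value with the color of its bin; since `palette_ineq` was reversed, bin 0 is the lightest red and bin 7 the darkest. The function below is a simplified sketch of that rule, not Bokeh's code: `linear_color_index` is a hypothetical name, it ignores options such as `low_color`/`high_color` by clamping out-of-range values to the end bins, and the bounds 0.35 and 0.65 are illustrative.

```python
import math


def linear_color_index(value, low, high, n_colors):
    """Return the palette index for `value`, or None for NaN (simplified sketch)."""
    if math.isnan(value):
        return None  # Bokeh would draw these with the mapper's nan_color
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1  # the top of the range belongs to the last bin
    # Equal-width bins: scale into [0, n_colors) and truncate
    return int((value - low) / (high - low) * n_colors)


for v in [0.30, 0.42, 0.45, 0.64, 0.70, float("nan")]:
    print(v, linear_color_index(v, 0.35, 0.65, 8))
```

Equal-width bins mean a county's color reflects its absolute position between the minimum and maximum, so a few extreme counties can compress everyone else into the middle colors.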
Goal: Explore more advanced tools in Bokeh, including interactive and geospatial components. Tools covered: hover, tap, slider, and basemap.
# Imports the hover tool
from bokeh.models import HoverTool
# Sets the hover tool up using data from the json file
hover = HoverTool(
tooltips=[
("County", "@name, @state"),
("Inequality", "@inequality_index"),
]
)
# Create figure object
p = figure(title = 'Inequality: Hover & Tap Tools',
height = 600,
width = 950,
toolbar_location = 'below',
tools = [hover, # Includes the hover and tap tool now
"pan, wheel_zoom, box_zoom, reset, save, tap"])
# Remove axes and grids
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False
# Add counties
counties = p.patches('xs','ys', source = df_geo,
fill_color = {"field" : "inequality_index",
"transform" : mapper},
line_color = "black",
line_width = 0.25,
fill_alpha = 1
)
show(p)
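Each `@field` in a tooltip template is replaced with that column's value for the glyph under the cursor, so the hover tool can only show fields that exist in the data source (here `name`, `state`, and `inequality_index` from the GeoJSON properties). The stand-alone sketch below mimics that substitution for a single row. `render_tooltip` is hypothetical; real Bokeh tooltips also support `@{field with spaces}`, `$`-prefixed special variables, and formatters, and show `???` for unknown fields, which the sketch imitates.

```python
import re


def render_tooltip(tooltips, row):
    """Hypothetical helper: replace @field references with values from `row` (a dict)."""
    pattern = re.compile(r"@(\w+)")
    lines = []
    for label, template in tooltips:
        # Unknown fields are shown as '???'
        text = pattern.sub(lambda m: str(row.get(m.group(1), "???")), template)
        lines.append(f"{label}: {text}")
    return lines


tooltip_spec = [("County", "@name, @state"), ("Inequality", "@inequality_index")]
row = {"name": "Anne Arundel", "state": "MD", "inequality_index": 0.4133}
print("\n".join(render_tooltip(tooltip_spec, row)))
```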
from bokeh.layouts import column
from bokeh.models import CustomJS, Slider, CustomJSFilter, CDSView, Column
# Web Mercator bounds (meters) covering DC, Maryland, and Virginia
x_range = (-9.35e6, -8.3e6) # west to east
y_range = (4.3e6, 4.9e6) # south to north
DMV = (x_range, y_range)
# Create figure object
p = figure(title = 'Inequality: Slider Tool',
height = 600,
width = 950,
x_range = x_range, # Set to maintain bounds as slider changes
y_range = y_range,
toolbar_location = 'below',
tools = [hover,
"pan, wheel_zoom, box_zoom, reset, save"])
# Create the inequality slider
slider = Slider(start=df["inequality_index"].min(),
end=df["inequality_index"].max(),
value=df["inequality_index"].min(), # start at the minimum so every county is shown initially
step=.01, title="Inequality")
# Triggers the filter as the slider changes
callback = CustomJS(args = dict(source=df_geo),
code = """source.change.emit();""")
slider.js_on_change('value', callback)
# Filters to counties at or above the slider's value
inequality_filter = CustomJSFilter(args = dict(slider = slider),
code = """
var indices = [];
// iterate through rows of data source and see if each satisfies some constraint
for (var i = 0; i < source.get_length(); i++){
if (source.data["inequality_index"][i] >= slider.value){
indices.push(true);
} else {
indices.push(false);
}
}
return indices;
""")
# A filter which determines counties below the slider value
inv_inequality_filter = CustomJSFilter(args = dict(slider = slider),
code = """
var indices = [];
// iterate through rows of data source and see if each satisfies some constraint
for (var i = 0; i < source.get_length(); i++){
if (source.data["inequality_index"][i] < slider.value){
indices.push(true);
} else {
indices.push(false);
}
}
return indices;
""")
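# Uses filters to determine which set of counties is highlighted
# (Bokeh 2.x syntax: in Bokeh 3.x, CDSView takes no `source` and uses `filter=` instead of `filters=`)
view = CDSView(source = df_geo, filters = [inequality_filter])
inv_view = CDSView(source = df_geo, filters = [inv_inequality_filter])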
# Remove axes and grids
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False
# Set of counties with inequalities shown
counties = p.patches('xs','ys', source = df_geo,
fill_color = {"field" : "inequality_index",
"transform" : mapper},
line_color = "black",
line_width = 0.25,
fill_alpha = 1,
view = view)
# Set of counties greyed out
counties_grey = p.patches('xs','ys', source = df_geo,
fill_color = "grey",
line_color = "black",
line_width = 0.25,
fill_alpha = 0.1,
view = inv_view)
layout = column(p, Column(slider))
show(layout)
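The two `CustomJSFilter` objects are complements: for any slider value, each county with a value passes exactly one of them, so it is drawn either in color or in the faded layer, never both. The same logic written as a Python function over a plain list makes that easy to check. `split_by_threshold` is a hypothetical name, not a Bokeh API, and the values are the first five inequality indices from the merged table.

```python
def split_by_threshold(values, threshold):
    """Mirror the two filters: True where value >= threshold, and its complement."""
    shown = [v >= threshold for v in values]   # like inequality_filter
    greyed = [v < threshold for v in values]   # like inv_inequality_filter
    return shown, greyed


inequality = [0.5269, 0.4133, 0.4100, 0.4677, 0.6002]  # first five rows of the merged df
shown, greyed = split_by_threshold(inequality, 0.45)
print(shown)   # [True, False, False, True, True]
print(greyed)  # [False, True, True, False, False]
```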
# Used to add the background map
from bokeh.models import WMTSTileSource
# Create figure object
p = figure(title = 'Inequality: Slider Tool with Basemap',
height = 600,
width = 950,
x_range = x_range,
y_range = y_range,
toolbar_location = 'below',
tools = [hover,
"pan, wheel_zoom, box_zoom, reset, save"],
x_axis_type="mercator", y_axis_type="mercator")
# Select where to acquire the basemap from
url = 'https://a.basemaps.cartocdn.com/rastertiles/voyager/{Z}/{X}/{Y}.png' # https avoids mixed-content blocking in https notebooks
attribution = "Tiles by Carto, under CC BY 3.0. Data by OSM, under ODbL"
# Add the basemap
p.add_tile(WMTSTileSource(url=url, attribution=attribution))
# Remove axes and grids
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False
# Set of counties with inequalities shown
counties = p.patches('xs','ys', source = df_geo,
fill_color = {"field" : "inequality_index",
"transform" : mapper},
line_color = "black",
line_width = 0.25,
fill_alpha = 0.4,
view = view)
# Counties below the slider value: outlines only (no fill), so the basemap shows through
counties_grey = p.patches('xs','ys', source = df_geo,
fill_color = None,
line_color = "black",
line_width = 0.25,
fill_alpha = 0.05,
view = inv_view)
layout = column(p, Column(slider))
show(layout)
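The `{Z}/{X}/{Y}` placeholders in the basemap URL are filled in by `WMTSTileSource` for every tile in view. `Z` is the zoom level; at zoom `Z` the world is a grid of `2**Z` by `2**Z` tiles, with `Y` counted from the top, the same convention as OpenStreetMap "slippy map" tiles. The sketch below uses the standard slippy-map formula to compute which tile contains a point; `lonlat_to_tile` is a hypothetical helper and the DC coordinates are approximate.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Hypothetical helper: (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


zoom = 10
tx, ty = lonlat_to_tile(-77.0369, 38.9072, zoom)  # approximately Washington, DC
tile_url_template = "https://a.basemaps.cartocdn.com/rastertiles/voyager/{Z}/{X}/{Y}.png"
print(tile_url_template.format(Z=zoom, X=tx, Y=ty))
```

Zooming in by one level doubles the number of tiles along each axis, which is why panning and zooming a basemap triggers many small image requests.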
Goal: Storing and presenting interactive visualizations.
Individual plots can be stored as static images or as interactive .html files. NOTE: Exporting static images requires additional packages: selenium plus a browser driver such as geckodriver (Firefox) or chromedriver (Chrome).
# For static plots, use the following
from bokeh.io import export_png
#export_png(p, filename="dmv_inequality.png")
# For interactive plots
from bokeh.io import output_file, reset_output
#output_file("dmv_inequality.html")
#show(p)
# To return to inline-only notebook plots afterwards: reset_output(); output_notebook()
To save an entire document of interactive visualizations, the simplest approach is to create them in Jupyter Notebook and export the notebook: go to File -> Export Notebook As... -> HTML. NOTE: Exporting a notebook with markdown cells will prevent the Bokeh visualizations from displaying. Likewise, exporting to HTML slides can sometimes cause display issues.
For more, I highly recommend the following tutorials and resources, many of which guided the construction of this tutorial: